DIFFICULTIES  WITH  REGRESSION  ANALYSES  OF  AGE-ADJUSTED  RATES


Paul  R.  Rosenbaum*  and  Donald  B.  Rubin** 

1.  Introduction:  A  Common  Type  of  Observational  Study

A common and inexpensive type of observational study uses previously
collected population data, such as census data, to assess the effects of
policies which are specific to certain counties, states or nations. An
example is the comparison of motor vehicle mortality rates in states with
and without required automobile inspection (Fuchs and Leveson 1967;
Colton and Buxbaum 1968). Note that in this example, all people living
in the same state are subject to the same law.

A  related  though  distinct  type  of  observational  study  involves  an 
exposure  or  treatment  that  is  more  prevalent  in  some  states  than  in 
others:  the  relationship  between  the  extent  of  exposure  and  the  outcome 
is  studied  in  an  effort  to  assess  the  effects  of  exposure.  Examples
include  (a)  studies  which  examine  site-specific  cancer  mortality  rates  in 
various  counties  and  their  relationship  to  environmental  factors  in  these 
counties  (e.g.,  Blair,  Fraumeni,  and  Mason  1980)  and  (b)  studies  of  the 
socioeconomic  correlates  of  mortality  (e.g.,  Kitagawa  and  Hauser  1973). 
Our  discussion  here  is  relevant  to  both  types  of  studies,  and 
demonstrates  that  standard  analyses,  such  as  those  in  the  above 
references,  are  generally  inappropriate.  The  problem  arises  because  the 
outcome variables used in those analyses, such as death rates in various
states, have been age adjusted, whereas the predictor variables have not
been age adjusted. The use of crude state death rates as the outcome
variable with crude covariates and age as predictors can avoid the
problem, at least under some simple linear models. The use of
age-specific rates as the outcome variable is generally inappropriate
unless age-specific predictors are used.

2.  A  Motivating  Simple  Case:  Age  Adjustment  By  Regression 

Suppose we wish to estimate the regression coefficient
$\beta_{YX_1 \cdot X_2}$ of $Y$ on $X_1$ in the multiple regression with
two predictors, $X_1$ and $X_2$. It is well known that the least squares
estimate of this coefficient may be found by, first, regressing $Y$ on
$X_2$ and calculating the residuals $Y \cdot X_2$, then regressing $X_1$
on $X_2$ and calculating the residuals $X_1 \cdot X_2$, and finally
calculating the estimate of $\beta_{YX_1 \cdot X_2}$ as the estimated
slope in the regression of the first set of residuals $Y \cdot X_2$ on
the second, $X_1 \cdot X_2$. An example is given by Mosteller and Tukey
(1977, p.271); the formal argument is given by Seber (1977, p.65). This
process of "sweeping out" one variable at a time forms the basis for
several of the algorithms used for multiple regression, particularly the
Gaussian pivoting in Beaton's sweep operator (Dempster 1969, p.62).
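The residual-on-residual calculation described above can be checked numerically. The following is only a minimal sketch on simulated data; the variable names, sample size and coefficient values are illustrative assumptions and do not come from the paper.

```python
import numpy as np

# Simulated data (all values here are illustrative assumptions).
rng = np.random.default_rng(0)
n = 500
x2 = rng.normal(size=n)                      # plays the role of X2
x1 = 0.8 * x2 + rng.normal(size=n)           # X1, correlated with X2
y = 1.0 + 2.0 * x1 - 1.5 * x2 + rng.normal(size=n)

def residuals(v, x):
    """Residuals from the least squares regression of v on x (with intercept)."""
    design = np.column_stack([np.ones_like(x), x])
    coef = np.linalg.lstsq(design, v, rcond=None)[0]
    return v - design @ coef

# Coefficient of X1 in the full multiple regression of Y on (1, X1, X2).
full_design = np.column_stack([np.ones(n), x1, x2])
beta_full = np.linalg.lstsq(full_design, y, rcond=None)[0]

# "Sweeping out" X2: regress residuals Y.X2 on residuals X1.X2.
y_res = residuals(y, x2)
x1_res = residuals(x1, x2)
beta_swept = (x1_res @ y_res) / (x1_res @ x1_res)
```

Under these assumptions `beta_swept` agrees with `beta_full[1]` up to floating point error, which is the identity the paragraph relies on.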

We can now give a rough description of the difficulty with the
regression analysis of age-adjusted rates; the argument is formalized in
the next section. Suppose that $Y$ is an age and state specific
mortality rate, that $X_2$ is the corresponding age, and that $X_1$ is
any variable that varies with both age and state, say $X_1$ = per capita
personal income. Roughly speaking, $Y \cdot X_2$ is the age-adjusted
mortality. To find the least squares estimate of
$\beta_{YX_1 \cdot X_2}$ we should regress age-adjusted mortality
$Y \cdot X_2$ on age-adjusted income $X_1 \cdot X_2$. However, that is
often not what is done; rather, age-adjusted mortality $Y \cdot X_2$ is
regressed on income $X_1$, giving a biased estimate unless income $X_1$
and age $X_2$ are orthogonal. The point is: if we adjust mortality for
age, we must adjust the covariates for age as well.
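To illustrate the size of the distortion in the two-predictor case, the sketch below regresses age-adjusted mortality on unadjusted income using simulated data. The data, names (`age`, `income`, `mortality`) and coefficients are assumptions made for illustration. In this simple setting the naive slope equals the correct coefficient multiplied by $1 - r^2$, where $r$ is the sample correlation between income and age, so it is attenuated whenever the two are correlated.

```python
import numpy as np

# Simulated data (illustrative assumptions, not the paper's data).
rng = np.random.default_rng(1)
n = 2000
age = rng.normal(size=n)                          # X2
income = 0.8 * age + rng.normal(size=n)           # X1, correlated with age
mortality = 1.0 + 2.0 * income - 1.5 * age + rng.normal(size=n)  # Y

def fit(columns, v):
    """Least squares coefficients of v on an intercept plus the given columns."""
    design = np.column_stack([np.ones(len(v)), *columns])
    return np.linalg.lstsq(design, v, rcond=None)[0]

def residuals(v, x):
    a, b = fit([x], v)
    return v - a - b * x

# Correct partial coefficient of income, adjusting for age.
beta_correct = fit([income, age], mortality)[1]

# Common mistake: age-adjusted mortality regressed on unadjusted income.
adj_mortality = residuals(mortality, age)
beta_naive = fit([income], adj_mortality)[1]

# Sample correlation between income and age.
r = np.corrcoef(income, age)[0, 1]
```

Here `beta_naive` equals `beta_correct * (1 - r**2)`, so the two coincide only when income and age are uncorrelated in the sample.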

Although age-adjusted mortality rates are commonly available, it is
uncommon to find covariates such as income that have been age adjusted
before tabulation. If the available data consist of adjusted mortality
rates and unadjusted per capita income for each state, we cannot
generally adjust income for age, and therefore cannot determine the
partial regression coefficient of mortality on income adjusting for age.

An alternative solution would be to regress adjusted mortality
$Y \cdot X_2$ on crude per capita income $X_1$ and crude age $X_2$, when
the age information, $X_2$, is available. It is easily shown that the
coefficient of income in this regression is the usual unbiased least
squares estimate of $\beta_{YX_1 \cdot X_2}$. Unfortunately this
procedure is not generally applicable to age-adjusted rates, for reasons
described in §5 below.
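The claim about the alternative regression can be verified in the residual setting of this section. The sketch below uses simulated data with hypothetical names; it shows only the two-predictor identity and says nothing about the age-adjusted rates treated later.

```python
import numpy as np

# Simulated data (illustrative assumptions).
rng = np.random.default_rng(6)
n = 1000
x2 = rng.normal(size=n)                       # age
x1 = 0.8 * x2 + rng.normal(size=n)            # income
y = 1.0 + 2.0 * x1 - 1.5 * x2 + rng.normal(size=n)

def ols(v, *cols):
    """Least squares coefficients of v on an intercept plus the given columns."""
    design = np.column_stack([np.ones(len(v)), *cols])
    return np.linalg.lstsq(design, v, rcond=None)[0]

# Adjust Y for X2 by regression: Y.X2 = Y - a - b * X2.
a, b = ols(y, x2)
y_adj = y - a - b * x2

coef_full = ols(y, x1, x2)      # Y on (1, X1, X2)
coef_alt = ols(y_adj, x1, x2)   # adjusted Y on (1, X1, X2)
```

The coefficient of `x1` is identical in the two fits (`coef_full[1]` and `coef_alt[1]`); only the intercept and the coefficient of `x2` change, by `a` and `b` respectively.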

3.  Regression  Analysis  of  Adjusted  Rates

Let $Y_{asi}$ be the response of the $i$th person with age $a$ in state
$s$, for $i = 1,2,\ldots,n_{as}$. For purposes of this discussion, we
assume the following linear model for $Y_{asi}$ which includes polynomial
terms in age:

$$E(Y_{asi} \mid D) = \alpha + \sum_{j=1}^{J} \beta_j a^j + \Delta Z_{asi} + \gamma^T X_s + \delta^T W_{asi} \qquad (1)$$

for $i = 1,2,\ldots,n_{as}$, $a = 1,2,\ldots,A$, $s = 1,2,\ldots,S$,
where

$Z_{asi} = 1$ if individual $i$ was exposed to the treatment and 0
otherwise,

$X_s$ is a vector of characteristics of state $s$ (e.g., minimum
driving age in the state),

$W_{asi}$ is a vector of characteristics of the individual (e.g., income,
marital status), excluding features of the state as a whole
since these are included in $X_s$, but possibly including
characteristics which are constant for all members of certain
counties (e.g., source of drinking water: city supplied vs.
private well),

$\alpha, \beta_1, \beta_2, \ldots, \beta_J, \Delta, \gamma, \delta$ are
parameters, and $D$ is short-hand for the age information and all the
$Z$'s, $X$'s and $W$'s. The polynomial in age can be replaced by other
linear structures such as an indicator variable for each age or age
category, a polynomial in the logarithm or exponential of age, or a
combination of a polynomial in age and indicator variables for extreme
age categories.

If $Y_{asi}$ is binary, the linear logistic model (Cox 1970) is more
attractive than the linear model for most purposes; however, the logit
model does not lead to straightforward conclusions about the common
practice of linearly regressing adjusted rates on predictors, nor would
use of the logit model eliminate the problems that we describe which
result from the use of age-adjusted rates.
The age and state specific mean response (or rate if $Y_{asi}$ is
binary) is $\bar{Y}_{as+} = \frac{1}{n_{as}} \sum_{i=1}^{n_{as}} Y_{asi}$.
By (1), the expectation of $\bar{Y}_{as+}$ is

$$E(\bar{Y}_{as+} \mid D) = \alpha + \sum_{j=1}^{J} \beta_j a^j + \Delta \bar{Z}_{as+} + \gamma^T X_s + \delta^T \bar{W}_{as+} \qquad (2)$$

where $\bar{Z}_{as+}$ and $\bar{W}_{as+}$ are averages of the $Z_{asi}$
and the $W_{asi}$, respectively. Clearly, the parameters of model (1) may
be estimated from a suitable weighted regression of the age and state
specific rates $\bar{Y}_{as+}$ on the age and state specific averages in
(2). For example, if the conditional variances given $D$ of the
$Y_{asi}$'s are all equal to a common value $\sigma^2$, and if the
$Y_{asi}$'s are conditionally uncorrelated, the appropriate weight for
$\bar{Y}_{as+}$ in regression model (2) is $n_{as}$. Other choices for
weights are described by Pocock, Cook and Beresford (1981).
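A minimal sketch of the weighting by $n_{as}$ follows, in the special case where the only predictors are age and a state-level covariate, so that every regressor is constant within an age-by-state cell. In that special case the weighted regression of cell means reproduces the individual-level least squares fit exactly. The data, the linear age term and the single covariate are assumptions made for illustration.

```python
import numpy as np

# Simulated cells (all numbers are illustrative assumptions).
rng = np.random.default_rng(2)
ages = np.array([1.0, 2.0, 3.0, 4.0])           # A = 4 age categories
x_state = np.array([0.0, 1.0, 0.0, 1.0, 1.0])   # one state covariate, S = 5

ind_X, ind_y = [], []                  # individual-level design and responses
cell_X, cell_mean, cell_n = [], [], [] # one row per (age, state) cell
for xs in x_state:
    for a in ages:
        n_as = int(rng.integers(5, 40))             # cell size n_as
        y = 0.5 + 0.3 * a + 1.2 * xs + rng.normal(size=n_as)
        row = [1.0, a, xs]
        ind_X.extend([row] * n_as)
        ind_y.extend(y)
        cell_X.append(row)
        cell_mean.append(y.mean())                  # cell mean response
        cell_n.append(n_as)

ind_X, ind_y = np.array(ind_X), np.array(ind_y)
cell_X, cell_mean, cell_n = np.array(cell_X), np.array(cell_mean), np.array(cell_n)

# Ordinary least squares on individuals.
coef_individual = np.linalg.lstsq(ind_X, ind_y, rcond=None)[0]

# Weighted least squares on cell means with weights n_as
# (implemented by scaling each row by sqrt(n_as)).
sw = np.sqrt(cell_n)
coef_cells = np.linalg.lstsq(cell_X * sw[:, None], cell_mean * sw, rcond=None)[0]
```

When $Z$ or $W$ vary within cells the two fits are no longer identical, although under the stated variance assumptions the weighted regression on cell averages still targets the parameters of model (1).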

Now consider the crude unadjusted rates for state $s$, namely
$\tilde{Y}_{+s+} = (\sum_a n_{as} \bar{Y}_{as+}) / (\sum_a n_{as})$, with
expectations

$$E(\tilde{Y}_{+s+} \mid D) = \alpha + \sum_{j=1}^{J} \beta_j m_{sj} + \Delta \tilde{Z}_{+s+} + \gamma^T X_s + \delta^T \tilde{W}_{+s+} \qquad (3)$$

where $\tilde{Z}_{+s+}$ and $\tilde{W}_{+s+}$ are averages of $Z_{asi}$
and $W_{asi}$ over all individuals in state $s$, and
$m_{sj} = (\sum_a n_{as} a^j) / (\sum_a n_{as})$ is the $j$th moment of
age in state $s$. If the first $J$ moments of the age distribution are
available from each state, then the parameters of model (1) may be
estimated by a suitable weighted regression of the crude rates
$\tilde{Y}_{+s+}$ on the crude predictors
$(m_{sj},\ j = 1,\ldots,J;\ \tilde{Z}_{+s+},\ X_s,\ \tilde{W}_{+s+})$ for
the states. For example, under the simple assumption of the previous
paragraph, the weight for $\tilde{Y}_{+s+}$ would be the population of
state $s$, namely $\sum_a n_{as}$.

In practice, the moments $m_{sj}$ of age distributions may be
approximated from frequency tabulations of age distributions for each
state, using, for example, the EM algorithm of Dempster, Laird and Rubin
(1977) to correct for grouping. If a linear structure other than a
polynomial is used for age in (1), then the corresponding averages would
appear in (3). For example, if indicators are used for each age
category, then the proportion of individuals in each age category in each
state, $p_{as} = n_{as} / \sum_a n_{as}$, would appear in (3).

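Now consider the age-adjusted rates

$$\bar{Y}_{+s+} = \sum_a f_a \bar{Y}_{as+}$$

where $f_a$ is the fraction of the reference population with age $a$.
Note that the same weights $f_a$ are applied in all states. For example,
the total population age distribution might be used as weights, so that
$f_a = \sum_s n_{as} / \sum_a \sum_s n_{as}$. Now,

$$\begin{aligned}
E(\bar{Y}_{+s+} \mid D)
 &= \alpha + \sum_{j=1}^{J} \beta_j \sum_a f_a a^j + \Delta \sum_a f_a \bar{Z}_{as+} + \gamma^T X_s + \delta^T \sum_a f_a \bar{W}_{as+} \\
 &= \alpha + \sum_{j=1}^{J} \beta_j m_j + \Delta \bar{Z}_{+s+} + \gamma^T X_s + \delta^T \bar{W}_{+s+} \\
 &= \alpha^{*} + \Delta \bar{Z}_{+s+} + \gamma^T X_s + \delta^T \bar{W}_{+s+} \qquad (4)
\end{aligned}$$

say, where $m_j = \sum_a f_a a^j$ is the $j$th moment of age in the
reference population, and $\bar{Z}_{+s+}$ and $\bar{W}_{+s+}$ are the
age-adjusted averages of $Z$ and $W$ for state $s$. Note that the
constant $\alpha^{*}$ includes the age component,
$\sum_{j=1}^{J} \beta_j m_j$, which is the same for all states; this
would be true no matter what linear structure is assumed in (1) for the
regression on age.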
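As a small worked illustration of the difference between crude and age-adjusted rates, the sketch below uses made-up counts and rates for three age groups and two states. All numbers are hypothetical. The two states have identical age-specific rates but different age distributions, so their crude rates differ while their age-adjusted rates coincide.

```python
import numpy as np

# Hypothetical counts n_as (rows: age groups a, columns: states s).
n = np.array([[500.0, 100.0],
              [300.0, 300.0],
              [200.0, 600.0]])

# Hypothetical age-specific rates, identical in both states.
rate = np.array([[0.01, 0.01],
                 [0.02, 0.02],
                 [0.05, 0.05]])

p = n / n.sum(axis=0)          # p_as: age distribution within each state
f = n.sum(axis=1) / n.sum()    # f_a: pooled (reference) age distribution

crude = (p * rate).sum(axis=0)   # crude rates, weighted by each state's own ages
adjusted = f @ rate              # age-adjusted rates, same weights f_a everywhere
```

With these numbers `crude` is `[0.021, 0.037]` and `adjusted` is `[0.029, 0.029]`.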
Equation (4) formally describes the difficulty, mentioned in the
last section, that is encountered when age-adjusted rates are regressed
on predictors. To estimate the parameters of the model (1), we must
regress the adjusted rates $\bar{Y}_{+s+}$ on the age-adjusted treatment
indicator $\bar{Z}_{+s+}$, the age-adjusted covariates $\bar{W}_{+s+}$,
and $X_s$. Note that there is no difficulty when both (a) treatment,
$Z_{asi}$, is constant within a state, as is the case when $Z$ represents
a state law, and (b) the only covariates involved are the descriptors
$X_s$ of the state as a whole, such as other state laws or policies.
However, there is a difficulty if there are covariates $W_{asi}$ such as
personal income that describe individuals within a state, or when there
are covariates such as pollution levels that describe areas within a
state, because in such cases age-adjusted income or pollution levels are
required to fit equation (4), and these quantities are rarely tabulated
in official publications. Moreover, the difficulty also occurs if
treatment, $Z_{asi}$, varies within a state, for in such cases, the
age-adjusted rate $\bar{Y}_{+s+}$ should be regressed on age-adjusted
exposure $\bar{Z}_{+s+}$.

Although age-specific death rates, $\bar{Y}_{as+}$, may be available, it
is often difficult to obtain age-specific predictors $\bar{Z}_{as+}$,
$\bar{W}_{as+}$. As a result, another common practice is to regress
age-specific rates $\bar{Y}_{as+}$ on crude predictors
$(\tilde{Z}_{+s+}, X_s, \tilde{W}_{+s+})$. An example is a study of the
association in 18 countries between wine consumption and cardiovascular
mortality among men and women aged 55 to 64 (St. Leger, Cochrane, and
Moore 1979). However, inspection of equation (2) shows that this
procedure is generally inappropriate, unless the age-specific predictors
$(\bar{Z}_{as+}, X_s, \bar{W}_{as+})$ equal the crude predictors
$(\tilde{Z}_{+s+}, X_s, \tilde{W}_{+s+})$.
4.  An  Example

This section presents an example to illustrate the problem described
in §3. The data used are a mixture of real and artificial data, because
the true values for the age-adjusted covariates were not available, and
we wished to dramatize possible effects. As a result, although the
studies from which the data were drawn may have been affected by the
problems we describe, our numerical results do not necessarily contradict
the qualitative conclusions of those studies.

Table 1 contains (a) age-adjusted motor vehicle accident mortality
rates ($\bar{Y}_{+s+}$) for white males in 1960 for the 48 contiguous
states of the United States, (b) a variable $\tilde{Z}_{+s+}$ indicating
whether the state requires motor vehicle inspections, (c) the percent of
the state living in urban areas, $\tilde{W}_{+s+}$, and (d) the
(artificial) age-adjusted percent of urbanization, $\bar{W}_{+s+}$. Since
the state law affects everyone in a state, the inspection indicator is
not altered by age-adjustment; i.e.,
$\bar{Z}_{+s+} = \tilde{Z}_{+s+}$.

Presumably, an individual's risk of accident mortality (e.g.,
prob($Y_{asi} = 1$), say) depends less on the statewide degree of
urbanization $\tilde{W}_{+s+}$ than on whether the individual himself
lives in an urbanized area (i.e., whether $W_{asi} = 1$, say). For
example, an individual living outside Massena, New York, far from
Manhattan, may be no more affected by the high percent of urbanization in
New York State than are residents of, say, Vermont. If the age
distributions in urban and rural areas differ, then $\tilde{W}_{+s+}$ and
$\bar{W}_{+s+}$ will generally differ, generally leading to a biased
estimate of the coefficient of automobile inspection $\bar{Z}_{+s+}$ when
adjusted mortality $\bar{Y}_{+s+}$ is regressed on $\bar{Z}_{+s+}$ and
crude urbanization $\tilde{W}_{+s+}$.
TABLE 1. Data For The Example: Mortality and Motor Vehicle Inspections

Column symbols: mortality $\bar{Y}_{+s+}$; inspection
$\bar{Z}_{+s+} = \tilde{Z}_{+s+}$; percent urban $\tilde{W}_{+s+}$;
age-adjusted percent urban $\bar{W}_{+s+}$.

State   Age-adjusted    Inspection   Percent    Age-adjusted
        Motor Vehicle   State*       Urban**    Percent
        Mortality*      (1 = yes)               Urban***
                        (0 = no)

  1        57.5            0.          26.7        26.7
  2        57.7            0.          50.1        50.1
  3        56.2            0.          13.4        13.4
  4        47.7            0.          35.2        35.2
  5        21.0            0.          34.4        34.4
  6        40.9            0.          25.6        25.6
  7        51.1            0.          24.1        24.1
  8        52.6            0.           0.0         0.0
  9        31.3            0.          42.1        42.1
 10        47.7            0.          30.0        30.0
 11        43.0            0.          22.0        22.0
 12        44.6            0.          17.1        17.2
 13        53.8            0.          16.0        16.0
 14        49.0            0.          32.5        32.5
 15        30.3            0.          30.3        30.3
 16        40.7            0.          32.9        32.9
 17        41.2            0.          27.1        27.1
 18        66.5            0.           6.6         6.6
 19        47.9            0.          32.4        32.4
 20        62.4            0.          16.0        16.0
 21        39.8            0.          30.5        30.5
 22        95.3            0.          40.6        40.6
 23        53.0            0.          16.0        16.0
 24        55.5            0.           7.4         7.4
 25        38.0            0.          35.2        35.2
 26        49.9            0.          27.8        27.8
 27        50.3            0.          24.0        24.0
 28        55.4            0.           9.6         9.6
 29        62.4            0.           9.6         9.6
 30        45.0            0.          25.5        25.5
 31        35.5            0.          31.1        31.1
 32        47.6            0.          28.4        28.4
 33        96.0            0.           0.0         0.0
 34        49.9            1.          37.4        67.4
 35        37.5            1.          21.5        51.5
 36        29.6            1.          14.2        44.2
 37        21.0            1.          34.7        64.7
 38        37.4            1.          14.5        44.5
 39        20.9            1.          18.7        48.7
 40        79.1            1.          21.2        51.2
 41        23.2            1.          55.8        85.8
 42        28.1            1.          31.1        61.1
 43        13.4            1.          33.6        63.6
 44        47.7            1.          46.3        76.3
 45        42.4            1.          35.3        65.3
 46        51.4            1.           0.0        30.0
 47        35.7            1.          25.1        55.1
 48        42.?            1.          13.5        42.5

*   From Colton and Buxbaum (1968). Rate is for white males in 1960,
    adjusted to the total population age distribution in 1960.

**  From Kitagawa and Hauser (1973).

*** Artificial. $\bar{W}_{+s+} = \tilde{W}_{+s+} + 30 \tilde{Z}_{+s+}$.


Table 2 summarizes the results of (a) regressing $\bar{Y}_{+s+}$ on the
age-adjusted predictors $\bar{Z}_{+s+}$ and $\bar{W}_{+s+}$ and (b)
regressing $\bar{Y}_{+s+}$ on the crude predictors $\tilde{Z}_{+s+}$ and
$\tilde{W}_{+s+}$. For purposes of illustration only, no attention has
been paid to the important questions of weighting the rates (Pocock, Cook
and Beresford 1981) or to regression diagnostics (Draper and Smith 1966,
chapter 3; Seber 1977, section 6.6). By construction of the age-adjusted
urbanization variable, the two estimates of the coefficient of inspection
differ markedly: with age-adjusted covariates, the coefficient is
positive; with unadjusted covariates, the coefficient is significantly
negative.
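The reason the construction forces the two inspection coefficients apart is purely algebraic, and can be sketched with synthetic data. The code below does not use the Table 1 values; the simulated rates and urbanization figures are assumptions, and only the rule "adjusted urbanization = crude urbanization + 30 × inspection" is taken from the table footnote. Because the adjusted covariate is an exact linear combination of the crude covariate and the indicator, the urbanization slope is unchanged and the inspection coefficient shifts by 30 times that slope.

```python
import numpy as np

# Synthetic stand-in for the 48 states (not the Table 1 data).
rng = np.random.default_rng(3)
S = 48
z = (np.arange(S) >= 33).astype(float)       # 15 hypothetical inspection states
w_crude = rng.uniform(0, 50, size=S)         # synthetic crude percent urban
w_adj = w_crude + 30.0 * z                   # construction from the table footnote
y = 60.0 - 5.0 * z - 0.4 * w_crude + rng.normal(scale=5.0, size=S)

def ols(*cols):
    """Unweighted least squares of y on an intercept plus the given columns."""
    X = np.column_stack([np.ones(S), *cols])
    return np.linalg.lstsq(X, y, rcond=None)[0]

_, d_crude, c_crude = ols(z, w_crude)   # inspection and urbanization, crude fit
_, d_adj, c_adj = ols(z, w_adj)         # inspection and urbanization, adjusted fit
```

In any such data set `c_adj == c_crude` and `d_adj == d_crude - 30 * c_crude` up to rounding, so a negative urbanization slope pushes the inspection coefficient upward when the adjusted covariate is used.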
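5.  Technical  Issues


5.1  A Formal Expression for the Bias of the Estimator of $\Delta$

We now obtain an expression for the bias that results from
regressing adjusted mortality $\bar{Y}_{+s+}$ on crude predictors
$\tilde{Z}_{+s+}$, $X_s$ and $\tilde{W}_{+s+}$. Let

$$V = \begin{pmatrix}
1 & \bar{Z}_{+1+} & X_1^T & \bar{W}_{+1+}^T \\
1 & \bar{Z}_{+2+} & X_2^T & \bar{W}_{+2+}^T \\
\vdots & \vdots & \vdots & \vdots \\
1 & \bar{Z}_{+S+} & X_S^T & \bar{W}_{+S+}^T
\end{pmatrix},
\qquad
\tilde{V} = \begin{pmatrix}
1 & \tilde{Z}_{+1+} & X_1^T & \tilde{W}_{+1+}^T \\
1 & \tilde{Z}_{+2+} & X_2^T & \tilde{W}_{+2+}^T \\
\vdots & \vdots & \vdots & \vdots \\
1 & \tilde{Z}_{+S+} & X_S^T & \tilde{W}_{+S+}^T
\end{pmatrix}$$

and let $\theta^T = (\alpha^{*}, \Delta, \gamma^T, \delta^T)$. Moreover,
let $Y = (\bar{Y}_{+1+}, \bar{Y}_{+2+}, \ldots, \bar{Y}_{+S+})^T$.
For any full rank matrix $\Omega$ that will be used to weight the
adjusted mortality rates, the estimator
$\hat{\theta} = (V^T \Omega V)^{-1} V^T \Omega Y$ that results from
regressing adjusted $\bar{Y}_{+s+}$ on adjusted covariates
$\bar{Z}_{+s+}$, $X_s$, $\bar{W}_{+s+}$ is unbiased for $\theta$ since
$E(Y \mid D) = V\theta$. However, the estimator
$\tilde{\theta} = (\tilde{V}^T \Omega \tilde{V})^{-1} \tilde{V}^T \Omega Y$
that results from regressing adjusted $\bar{Y}_{+s+}$ on crude covariates
$\tilde{Z}_{+s+}$, $X_s$, $\tilde{W}_{+s+}$ has bias

$$E(\tilde{\theta} - \theta \mid D) = \left[ (\tilde{V}^T \Omega \tilde{V})^{-1} \tilde{V}^T \Omega V - I \right] \theta \qquad (5)$$

where $I$ is the identity matrix. Let $t = (t_1, t_2, \ldots)$ be the
second row of $(\tilde{V}^T \Omega \tilde{V})^{-1} \tilde{V}^T \Omega V$;
the bias in the estimator of $\Delta$ from $\tilde{\theta}$ is
$t\theta - \Delta$, and so, as we would expect, the bias in the estimator
of $\Delta$ is affected by all the variables.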
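The bias expression can be evaluated directly once the adjusted and crude design matrices are in hand. The sketch below is a hedged illustration on synthetic data: the names `V`, `V_tilde`, `Omega` and the parameter values in `theta` are assumptions, there is no $X_s$ column, the weighting is the identity, and the adjusted covariate is built with the same "+30 × inspection" rule as the example. It is not a reconstruction of the paper's Table 2.

```python
import numpy as np

# Synthetic design (assumed values; not the Table 1 data).
rng = np.random.default_rng(4)
S = 48
z = (np.arange(S) >= 33).astype(float)       # inspection indicator
w_crude = rng.uniform(0, 50, size=S)         # crude covariate
w_adj = w_crude + 30.0 * z                   # age-adjusted covariate

V = np.column_stack([np.ones(S), z, w_adj])          # adjusted design matrix
V_tilde = np.column_stack([np.ones(S), z, w_crude])  # crude design matrix
Omega = np.eye(S)                                    # unweighted regression
theta = np.array([60.0, 2.0, -0.4])   # hypothetical (alpha*, Delta, delta)

# T = (V~' Omega V~)^{-1} V~' Omega V, computed without an explicit inverse.
T = np.linalg.solve(V_tilde.T @ Omega @ V_tilde, V_tilde.T @ Omega @ V)

bias = (T - np.eye(3)) @ theta   # bias of the crude-covariate estimator
t = T[1]                         # second row: governs the bias in Delta
bias_delta = t @ theta - theta[1]
```

Under this construction `t` is `(0, 1, 30)`, so the bias in the inspection coefficient is 30 times the urbanization coefficient, here `-12`.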

If $Z_{asi}$ is constant within each state, as in the case of a state
law, then $t_i$ is the mean difference, in the $i$th column of $V$,
between states with the law ($\bar{Z}_{+s+} = 1$) and states without the
law ($\bar{Z}_{+s+} = 0$) after covariance adjustment for $X_s$ and
$\tilde{W}_{+s+}$. For instance, in the example in §4, $t_3$ is the mean
difference between inspection and noninspection states in age-adjusted
urbanization after covariance adjustment for crude urbanization.


5.2  Properties  of  an  Alternative  Estimator 

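An alternative estimator, mentioned at the end of §2, involves
regressing adjusted mortality $\bar{Y}_{+s+}$ on crude predictors
$\tilde{Z}_{+s+}$, $X_s$, $\tilde{W}_{+s+}$ and age. Age may be
represented either by moments of the age distributions within the states,
$m_{sj}$, or by the proportions $p_{as}$ of people in state $s$ with age
$a$. From (4) we have

$$E(\bar{Y}_{+s+} \mid D) = \alpha^{*} + \Delta \tilde{Z}_{+s+} + \gamma^T X_s + \delta^T \tilde{W}_{+s+} + \Delta (\bar{Z}_{+s+} - \tilde{Z}_{+s+}) + \delta^T (\bar{W}_{+s+} - \tilde{W}_{+s+}) \qquad (6)$$

where the $p_{as}$'s have zero coefficients since the expectation in (4),
which is conditional on all the age information in $D$, does not depend
on age. If the differences $(\bar{Z}_{+s+} - \tilde{Z}_{+s+})$ and
$(\bar{W}_{+s+} - \tilde{W}_{+s+})$ can be written as linear functions of
the $p_{as}$'s, then (6) can be rewritten as

$$E(\bar{Y}_{+s+} \mid D) = \alpha' + \Delta \tilde{Z}_{+s+} + \gamma^T X_s + \delta^T \tilde{W}_{+s+} + \sum_a \eta_a p_{as} \qquad (7)$$

for some parameters $\alpha'$ and $\eta_a$, $a = 1,2,\ldots,A$; in this
case, the alternative estimator leads to unbiased estimates of $\Delta$.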

The differences $(\bar{Z}_{+s+} - \tilde{Z}_{+s+})$ and
$(\bar{W}_{+s+} - \tilde{W}_{+s+})$ will indeed be linear functions of the
proportions $p_{as}$ if the age-specific regressors $\bar{Z}_{as+}$ and
$\bar{W}_{as+}$ can be written as the sum of an age and a state
component, i.e. if

$$\bar{Z}_{as+} = b_a + r_s \quad \text{and} \quad \bar{W}_{as+} = u_a + v_s \qquad (8)$$

for some scalars $b_a$ and $r_s$, and some vectors $u_a$ and $v_s$, for
all $a$ and $s$. If $\bar{W}_{as+}$ is average income in state $s$ at age
$a$, and (8) is true, then the difference in average income between New
York and Virginia, say, is the same at all ages. To see that (8) implies
the required linear dependence, note that

$$\begin{aligned}
\bar{Z}_{+s+} - \tilde{Z}_{+s+}
 &= \sum_a \bar{Z}_{as+} (f_a - p_{as}) \\
 &= \sum_a (b_a + r_s)(f_a - p_{as}) \\
 &= \Big( \sum_a f_a b_a \Big) - \Big( \sum_a p_{as} b_a \Big) \qquad (9)
\end{aligned}$$

since $\sum_a f_a = \sum_a p_{as} = 1$. As required, (9) is a linear
function of the $p_{as}$'s. Analogous arguments apply to the
$\bar{W}_{as+}$'s.
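The step from the additive condition to linear dependence on the age proportions can be checked numerically. The following sketch uses randomly generated counts and components; the array names and sizes are hypothetical. It builds age-specific averages as an age component plus a state component and compares the difference between the age-adjusted and crude averages with the linear function of the proportions derived above.

```python
import numpy as np

# Randomly generated inputs (all illustrative assumptions).
rng = np.random.default_rng(5)
A, S = 6, 20
n = rng.integers(50, 500, size=(A, S)).astype(float)  # hypothetical counts n_as
p = n / n.sum(axis=0)                                  # p_as, columns sum to 1
f = n.sum(axis=1) / n.sum()                            # f_a, pooled reference

b = rng.normal(size=A)             # age component b_a
r = rng.normal(size=S)             # state component r_s
z_bar = b[:, None] + r[None, :]    # age-specific averages: b_a + r_s

z_adjusted = f @ z_bar                  # sum_a f_a * Z_as+
z_crude = (p * z_bar).sum(axis=0)       # sum_a p_as * Z_as+
difference = z_adjusted - z_crude

# The state component cancels, leaving a linear function of the p_as.
linear_in_p = f @ b - p.T @ b
```

Here `difference` and `linear_in_p` agree for every state. If an age-by-state interaction is added to `z_bar`, the agreement generally fails, which is the sense in which the condition is restrictive.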

The condition that $(\bar{Z}_{+s+} - \tilde{Z}_{+s+})$ and
$(\bar{W}_{+s+} - \tilde{W}_{+s+})$ must be linear functions of the
proportions $p_{as}$ is quite restrictive. Even random deviations from
linear dependence would constitute errors in the predictor variables,
leading to biased estimates by analogy with standard arguments (e.g.
Seber 1977, p.155; Johnston 1972, p.281).
6.  Summary

We have considered the following seven procedures:

(a) Regression of the responses of individuals, $Y_{asi}$, on the age of
    individuals and the predictors ($Z_{asi}$, $X_s$, $W_{asi}$)
    describing individuals.

(b) Weighted regression of the age-specific response rates
    $\bar{Y}_{as+}$ on the age-specific predictor averages
    ($\bar{Z}_{as+}$, $X_s$, $\bar{W}_{as+}$).

(c) Weighted regression of the crude response rates $\tilde{Y}_{+s+}$ on
    the crude predictor averages ($\tilde{Z}_{+s+}$, $X_s$,
    $\tilde{W}_{+s+}$).

(d) Weighted regression of the age-adjusted rates $\bar{Y}_{+s+}$ on the
    age-adjusted predictors ($\bar{Z}_{+s+}$, $X_s$, $\bar{W}_{+s+}$).

(e) Weighted regression of age-adjusted rates $\bar{Y}_{+s+}$ on age and
    crude predictors ($\tilde{Z}_{+s+}$, $X_s$, $\tilde{W}_{+s+}$).

(f) Weighted regression of age-adjusted rates $\bar{Y}_{+s+}$ on crude
    predictors ($\tilde{Z}_{+s+}$, $X_s$, $\tilde{W}_{+s+}$).

(g) Weighted regression of age-specific rates $\bar{Y}_{as+}$ on crude
    predictors ($\tilde{Z}_{+s+}$, $X_s$, $\tilde{W}_{+s+}$).

Under the simple linear model for (a), that is equation (1), methods
(a) through (d) yield unbiased estimates of the parameters of the model;
however, the data required for methods (a), (b), and (d) are often
unavailable in official tabulations. The crude rates required for (c)
are available in some but not all official tabulations: for example,
homicide death rates are rarely age-adjusted, whereas coronary disease
mortality rates are usually age-adjusted. Method (e) can yield unbiased
estimates under restrictive assumptions defined in §5.2. Methods (f) and
(g), although popular techniques in practice, do not generally lead to
unbiased estimates under the linear model for (a).